Abstract

Background: Early stages of fruit development from initial set through exponential growth are critical determinants
of size and yield; however, there has been little detailed analysis of this phase of development. In this study we
combined morphological analysis with 454 pyrosequencing to study transcript level changes occurring in young 
cucumber fruit at five ages from anthesis through the end of exponential growth. 

Results: The fruit samples produced 1.13 million ESTs which were assembled into 27,859 contigs with a mean 
length of 834 base pairs and a mean of 67 reads per contig. All contigs were mapped to the cucumber genome. 
Principal component analysis separated the fruit ages into three groups corresponding with cell division/pre-exponential
growth (0 and 4 days post pollination (dpp)), peak exponential expansion (8 dpp), and late/post-exponential
expansion stages of growth (12 and 16 dpp). Transcripts predominantly expressed at 0 and 4 dpp
included homologs of histones, cyclins, and plastid and photosynthesis related genes. The group of genes with
peak transcript levels at 8 dpp included cytoskeleton, cell wall, lipid metabolism and phloem related proteins. This
group was also dominated by genes with unknown function or without known homologs outside of cucurbits. A
second shift in transcript profile was observed at 12-16 dpp, which was characterized by abiotic and biotic stress
related genes and significant enrichment for transcription factor gene homologs, including many associated with 
stress response and development. 

Conclusions: The transcriptome data coupled with morphological analyses provide an informative picture of early 
fruit development. Progressive waves of transcript abundance were associated with cell division, development of 
photosynthetic capacity, cell expansion and fruit growth, phloem activity, protection of the fruit surface, and finally 
transition away from fruit growth toward a stage of enhanced stress responses. These results suggest that the 
interval between expansive growth and ripening includes further developmental differentiation with an emphasis 
on defense. The increased transcript levels of cucurbit-specific genes during the exponential growth stage may 
indicate unique factors contributing to rapid growth in cucurbits. 
Background

Fleshy fruits are highly prized for nutritional content, fla- 
vor, fragrance, and appearance. While most fruits are 
eaten when ripe, a subset, including many that for culin- 
ary purposes are viewed as vegetables, are consumed im- 
mature. Cucumbers (Cucumis sativus), which are used as 
fresh product and processed into pickles, are typically 
harvested at the middle or end of the exponential growth
phase, 1-2 weeks post-pollination, and approximately 2- 
3 weeks prior to fruit maturation. 

Early fruit development is typified by phases of cell 
division and expansion [1]. In cucumber fruit, which de- 
velop from an enlarged inferior ovary, cell division 
occurs most rapidly prior to anthesis and then continues 
more slowly in the first 0-5 days post anthesis [2-5]. 
This phase largely overlaps with the period of highest 
respiration [4]. Fruit elongation begins almost immedi- 
ately after pollination, with the most rapid increase 
occurring approximately 4-12 days post pollination
(dpp) [6]. The rapid increase in cell size mirrors the 
rapid increase in fruit length, with obvious increase in 
vacuolization of mesocarp cells, and thickening in epi- 
dermal cell walls occurring between 8 and 16 dpp [6]. 
Cell division and expansion are largely completed by 12- 
16 dpp, with some variation depending on cultivar and 
season [4,6,7]. 

In addition to cell division and expansion, early devel- 
opment also includes specialized tissue and organ devel- 
opment and interaction with the abiotic and biotic 
environment. For example, developing cucumber fruit 
exhibit a distinct change in susceptibility to the soil- 
borne, oomycete pathogen, Phytophthora capsici; young 
fruit are highly susceptible, while older fruit are resistant 
[8,9]. There is a sharp transition in susceptibility that 
occurs at approximately 10-12 dpp coinciding with the 
end of the period of rapid fruit elongation. This age- 
related resistance suggests additional kinds of develop- 
mental changes occurring in the young cucumber fruit. 

Although a limited number of studies have examined 
gene expression during early fruit development, a picture 
reflecting cell division and expansion is beginning to 
emerge based on transcriptomic studies of apple, cucumber,
grape, tomato and watermelon. Among the enriched
categories associated with tomato fruit set were genes
associated with protein biosynthesis, histones, nucleo- 
some and chromosome assembly and cell cycle, suggest- 
ing a profile reflective of active cell division [10-12]. In 
contrast, various water, sugar and organic acid transport- 
associated genes were under-represented, but then 
increased with the transition from cell division to cell expansion.
Categories of genes highly expressed
in expanding cucumber, as well as apple, grape, tomato,
melon and watermelon fruits, included cytoskeleton and
cell wall modifying genes such as tubulins, expansins,
endo-1,2-β-glucanase, beta glucosidases, pectate lyases,
and pectin methylesterases, and transport associated
genes such as aquaporins, vacuolar H+-ATPases, and
phloem-associated proteins [6,10,13-18]. The most
highly represented transcripts in rapidly expanding cucumber
fruit (8 dpp) also were strongly enriched for
defense related homologs, including lipid, latex, and
defense-related genes, e.g., chitinase, thionin, hevein,
snakin, peroxidase, catalase, thioredoxin, and dehydrins
[6].

The early stages of fruit development, including fruit 
set and exponential growth, are clearly essential for all 
fruits. However, despite their importance as determi- 
nants of fruit size and yield, there has been little detailed 
analysis of this phase of development. Most studies to 
date, including recent transcriptomic studies, have focused
on late development, or a broad range of developmental
stages, with only a single snapshot during early
development, e.g., [19-22]. In this study we combined
morphological characterization with transcriptome ana- 
lysis to provide new insight into important early fruit 
developmental stages and processes. Our observations, 
performed at five time points during the period from 
fruit set through the end of exponential fruit growth, 
indicate that this is a dynamic period of cucumber 
fruit development involving an array of internal and 
external morphological, physiological, and transcrip- 
tomic changes that act in concert with phases of active 
cell division, expansion, and response to the environ- 
ment. Relative to anthesis and early fruit set, the 
period of peak- and late-exponential growth includes a 
large portion of highly represented transcripts, either 
of unknown function, or without homologs in Arabi- 
dopsis, suggesting unique factors contributing to the 
rapid growth phase in cucurbits. The end of exponen- 
tial growth was marked by a shift in transcriptome 
profile characterized by abiotic and biotic stress related 
genes and significant enrichment for transcription 
factor gene homologs associated with stress response 
and development, suggesting that the interval between 
expansive growth and ripening may include a pro- 
grammed transition toward enhanced defense. 

Results and discussion 

Morphological changes during early cucumber fruit 
development 

Young Vlaspik cucumber fruit followed a highly repro- 
ducible progression of growth and development includ- 
ing visible external and internal morphological changes. 
Increase in size occurred rapidly after fertilization with 
most rapid growth occurring between 4 and 12 dpp 
(Figure 1A). After approximately 16 dpp, fruit size
remained largely constant until fruit maturation at ap- 
proximately 30 dpp. At 0 dpp (anthesis), deep ridges 
along the length of fruit covered the surface of the fruit. 
Densely spaced spines were randomly scattered relative 
to the ridges (Figure 1B). In contrast to ridges, which
were most prominent at anthesis, warts, which are typically
formed at the base of spines, were diminutive at
0 dpp. They rapidly developed to become highly promin- 
ent at 4 dpp but then flattened out with further fruit ex- 
pansion. Both ridges and warts were nearly absent by 12 
dpp. The spines followed a maturation process culminat- 
ing in abscission. At 0 dpp spine color was translucent 
light green. At approximately 8 dpp they started to sen- 
esce, turning yellow, then white at 12-16 dpp. By 16 dpp 
many had abscised from the fruit surface. 

At anthesis, the exocarp was dark green. Dark green/ 
light green stripes and specks on the surface of the fruit 
began to emerge around 8 dpp. The fruit surface at anthesis
also had a dull appearance due to 'bloom'
(Figure 1B), a fine white powder primarily composed of
silica oxide (SiO2) [23]. The bloom disappeared first
from the peduncle end around 4 dpp, then the blossom
end by 8 dpp; by 12 dpp, it had disappeared completely,
leaving a shiny fruit surface. The cuticle layer showed
increased thickness with age. After 12 and 16 dpp it
stained more darkly with Sudan IV, indicating increased
cutin or wax content that appeared to penetrate between
the palisade cells in the epidermal layer (Figure 1C).

Figure 1 Growth and development of cucumber fruits. (A) Increase in fruit length and diameter as a function of days post pollination (dpp).
(B) Changes in fruit surface during fruit growth including spine maturation and abscission, development and subsidence of warts, presence of
'bloom' and loss of chlorophyll. (C) Micrographs showing changes in fruit surface with age (magnification 200x; staining was with Sudan IV). (D)
Thickness of fruit pericarp and placenta. (E) Cross section of developing cucumber fruit at 0, 4, 8, 12 and 16 dpp.

With respect to internal fruit morphology, both pla- 
centa and pericarp rapidly expanded from 4-16 dpp. 
The rate and amount of expansion was very similar for 
both tissues (Figure 1D). The mesocarp was initially
green at 0 and 4 dpp, but became progressively lighter
with age. Increase in mesocarp cell size is accompanied
by increased vacuolization between 4 and 12 dpp [6].
The placenta tissue became gelatinous between 8 and 12
dpp and hardening of seed coats occurred between 12
and 16 dpp (Figure 1E).



454 pyrosequencing data 

454 pyrosequencing analysis of cDNA libraries prepared 
from pericarp RNA samples of fruit harvested 0, 4, 8, 
12, and 16 days post-pollination provided 1.13 million 
reads (Additional file 1: Table S1). The resulting data
were assembled into 27,859 contigs with a mean length 
of 834 base pairs (bp). All transcripts were mapped to 
the assembled cucumber genome of Huang et al. [24], 
although in some cases more than one transcript 
mapped to the same location. The number of the reads 
per contig ranged from 2 to more than 14,000 with a 
mean of 67 reads per contig and median of 7 reads/contig.
Assembled contig length increased steadily with the
number of ESTs/contig, until approximately 30 reads/
contig where it leveled off with an average length of approximately
1400 bp (Additional file 2: Figure S1). Similarly,
frequency of identification of homologs in
Arabidopsis increased with number of ESTs/contig,
leveling off at approximately 90% with approximately
30 reads/contig (Additional file 2: Figure S1).

Gene ontology (GO) assignment to those contigs with
putative homologs in Arabidopsis showed a distribution
of gene functions similar to that of the full Arabidopsis
genome (generally within 2-fold relative to the
distribution in Arabidopsis), suggesting broad representation
of the genome (Additional file 2: Figure S1). Approximately
half of the contigs with >30 reads but
without homologs in Arabidopsis had putative homologs 
in other species. The final portion, approximately 5% of 
the total (275 contigs), either did not have any identified 
homologs in the current NCBI nr database, or only had 
putative homologs in cucurbit species, suggesting that 
these transcripts may be unique to cucumber or cucur- 
bits relative to the plant species sequenced to date. These 
potentially cucurbit-unique transcripts included 91 very 
highly expressed contigs, represented by at least 100 
ESTs (average length >1000 bp). Eighteen had putative 
functional assignments, eight of which were known cu- 
curbit specific phloem-related proteins, such as phloem 
lectins and phloem proteins (Table 1). 

Changes in transcript abundance during early fruit growth 

Based on the observed relationship between ESTs/contig, 
contig length, and putative homologs in Arabidopsis
(Additional file 2: Figure S1), subsequent bioinformatic
analyses were performed on contigs represented by at
least 30 ESTs. Contigs represented by
at least 30 ESTs that did not have putative homologs
outside of cucurbit species were not evenly distributed
across fruit age (Figure 2A). The 8, 12, and 16 dpp libraries
contained nearly twice as many contigs without identified
homologs in Arabidopsis as was observed for the 0
and 4 dpp libraries. Of the 91 very highly abundant transcripts
without known homologs outside of cucurbits,
only three were not observed in the 8, 12 or 16 dpp samples.
In contrast, 17 of the cucurbit specific transcripts
did not appear in 0 or 4 dpp samples.

To validate usefulness of the 454 sequence data for ana- 
lysis of transcript abundance, a set of fourteen genes repre- 
senting different levels of EST representation/contig 
across the different fruit ages were selected for quantitative 
real time (qRT)-PCR analysis (Additional file 3: Figure S2).
These included genes such as cyclin-dependent kinase
B2;2 with high transcript levels early in development (0-4
dpp) or expansin A5 with higher transcript levels at 8-16
dpp. Comparison of transcript level at a given age relative
to baseline expression at 0 dpp (56 gene/time comparisons)
showed good correspondence between values obtained by
454 sequencing and qRT-PCR (Pearson's correlation,
R² = 0.85; Additional file 3: Figure S2). There was also good
correspondence between the qRT-PCR results obtained
from two different growth experiments in the greenhouse
(R² = 0.91), indicating biological reproducibility of patterns
of gene expression across fruit ages, and validity of the use
of frequency of EST representation in the 454 library as a
measure of level of gene expression.

Table 1 Functional annotation of transcripts represented by greater than 30 ESTs with homologs identified only in
cucurbit species

No. reads  Contig #  Length (bp)  Hit ID         Best BLASTX hit                                                               E value
1,222      2152      815          gi|22023939    26 kDa phloem protein [Cucumis sativus]                                       2.0E-89
886        3111      1,010        gi|1753099     phloem filament protein; PP1; phloem protein 1 [Cucurbita maxima]             1.0E-25
546        2219      882          gi|21952270    26 kDa phloem lectin [Cucumis sativus]                                        4.0E-83
438        1685      625          gi|1669529     CRG16 (gibberellin responsive) [Cucumis sativus]                              4.0E-27
315        372       825          gi|21745319    17 kDa phloem lectin [Cucumis sativus]                                        4.0E-62
312        1342      678          gi|21952272    phloem lectin [Cucurbita argyrosperma subsp. sororia]                         8.0E-42
257        1071      857          gi|33415266    poly(A)-binding protein C-terminal interacting protein 6 [Cucumis sativus]    1.0E-72
139        4582      965          gi|121686470   26 kDa phloem lectin [Cucumis melo]                                           1.0E-55
130        2648      1,894        gi|219567000   galactose-binding type-2 ribosome-inactivating protein [Momordica charantia]  1.0E-130
129        2280      668          gi|94450551    pathogen induced 4 protein [Cucumis sativus]                                  9.0E-32
112        5677      1,666        gi|148270942   expressed protein [Cucumis melo]                                              1.0E-174
77         6868      575          gi|2406582     pathogen-induced protein CuPi1 [Cucumis sativus]                              4.0E-46
62         6937      620          gi|2576407     seed nucellus-specific protein [Citrullus lanatus]                            2.0E-40
54         3108      803          gi|21745319    17 kDa phloem lectin [Cucumis sativus]                                        5.0E-88
51         3170      696          gi|169219257   putative Gly-rich RNA-binding protein [Cucumis sativus]                       4.0E-49
48         1586      901          gi|121686470   26 kDa phloem lectin [Cucumis melo]                                           4.0E-31
42         2896      563          gi|51537955    beta-caryophyllene synthase [Cucumis sativus]                                 2.0E-52
37         2099      733          gi|58263793    profilin [Cucumis melo]                                                       1.0E-29
33         5950      1,288        gi|28558780    gag-protease polyprotein [Cucumis melo]                                       1.0E-80

Figure 2 Comparison of transcripts expressed in the cucumber fruit libraries. (A) Portion of contigs represented by at least 30 ESTs in a
given fruit age library [0, 4, 8, 12, or 16 days post pollination (dpp)] that did not have putative homologs in Arabidopsis or other sequences
present in the NCBI nr database. (B) Principal component analysis of transcripts expressed at the five different fruit ages. (C) Relationship between
ages grouped by principal component analysis and fruit growth. (D) Venn diagrams showing genes commonly expressed among the three age
groups.
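
As a rough illustration of the comparison described above, the sketch below computes log2 ratios relative to 0 dpp for each gene and each later age (14 genes x 4 later ages gives the 56 gene/time comparisons) and then correlates the two platforms. The gene names, the numeric values, the use of log2 ratios, and the one-read pseudocount are assumptions made for illustration only; they are not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log2_ratios_to_baseline(levels, pseudocount=0.0):
    """log2(level at each later age / level at 0 dpp); levels[0] is the 0 dpp value."""
    base = levels[0] + pseudocount
    return [math.log2((v + pseudocount) / base) for v in levels[1:]]

# Hypothetical values for two genes at 0, 4, 8, 12 and 16 dpp (not data from the study).
reads_454 = {"early_gene": [120, 95, 20, 8, 5], "late_gene": [4, 10, 60, 75, 80]}
qrt_pcr = {"early_gene": [1.00, 0.70, 0.20, 0.08, 0.05], "late_gene": [1.0, 2.2, 11.0, 15.0, 18.0]}

seq_ratios, pcr_ratios = [], []
for gene in reads_454:
    # A pseudocount of one read guards against zero counts in the sequencing data.
    seq_ratios += log2_ratios_to_baseline(reads_454[gene], pseudocount=1.0)
    pcr_ratios += log2_ratios_to_baseline(qrt_pcr[gene])

r = pearson_r(seq_ratios, pcr_ratios)
print(f"{len(seq_ratios)} gene/time comparisons, R^2 = {r * r:.2f}")
```

With 14 genes in place of the two hypothetical ones, the same loop would yield the 56 comparisons mentioned in the text.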

Principal component analysis (PCA) was performed on 
transcript levels among the libraries from the five fruit 
ages (Figure 2B). The first two components, which 
accounted for nearly 90% of the variation, separated the 
fruit ages into three groups, 0 and 4 dpp, 8 dpp, and 12 
and 16 dpp. Examination of fruit growth rate indicated 
that these age groups correspond with cell division/pre-exponential
growth, peak exponential expansion, and late/post-exponential
expansion stages of growth, respectively
(Figure 2C). Comparison of the transcripts present in each
of the age groups showed that the great majority were
detected in all three age groups. The fewest unique
transcripts were present in the 8 dpp sample, consistent
with a developmental gradient of transcription moving
from 0-4 to 8 to 12-16 dpp. Both the PCA and Venn diagram
(Figure 2B,D) show the least commonality between
the 0-4 and 12-16 dpp age groups.
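
To make the PCA step concrete, the following is a minimal sketch that extracts the first two principal components of a small library-by-gene matrix using power iteration on the sample Gram matrix. The expression values are invented toy numbers, and the choice of power iteration (rather than whatever software the authors used, which is not stated here) is an assumption made to keep the example dependency-free.

```python
import math

def center_columns(rows):
    """Subtract each column (gene) mean so that libraries are compared around zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def principal_components(rows, k=2, iterations=500):
    """Return (scores, explained) for the first k components.

    scores[c][s] is the score of sample s on component c;
    explained[c] is the fraction of total variance captured by component c.
    """
    x = center_columns(rows)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]
    total_variance = sum(gram[i][i] for i in range(n))
    scores, explained = [], []
    for _ in range(k):
        v = [float(2 ** i) for i in range(n)]  # fixed, asymmetric start vector
        for _ in range(iterations):
            w = mat_vec(gram, v)
            norm = math.sqrt(sum(c * c for c in w))
            v = [c / norm for c in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(gram, v)))
        scores.append([math.sqrt(eigenvalue) * c for c in v])
        explained.append(eigenvalue / total_variance)
        # Deflate: remove the component just found before extracting the next one.
        gram = [[gram[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return scores, explained

AGES = [0, 4, 8, 12, 16]  # dpp
# Toy matrix (rows = libraries, columns = early, mid, late and constant genes).
levels = [
    [10, 1, 1, 5],
    [9, 2, 1, 5],
    [3, 10, 3, 5],
    [1, 3, 9, 5],
    [1, 2, 10, 5],
]
scores, explained = principal_components(levels)
for age, pc1, pc2 in zip(AGES, scores[0], scores[1]):
    print(f"{age:>2} dpp  PC1 = {pc1:6.2f}  PC2 = {pc2:6.2f}")
```

In this toy example the first component separates the 0 and 4 dpp libraries from the 12 and 16 dpp libraries, and the second sets 8 dpp apart, mirroring the three-group pattern described in the text.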

The most highly represented contigs in each age group
(>0.1% of transcript pool; 79, 111 and 107 contigs for 0-4,
8, and 12-16 dpp, respectively) exhibited markedly different
profiles of putative gene function. Among those
common to all three groups were housekeeping genes including
numerous ribosomal protein genes, and several
tubulins, actins, and redox-related genes (catalase, ascorbate
oxidase, ascorbate peroxidase), as well as several
with unknown function or no identifiable homolog in
Arabidopsis. Examination of the transcripts that were
very highly represented in only one age group (Table 2)
showed that 0-4 dpp was the only one to include histone
genes. This observation is consistent with the high level
of DNA replication occurring in young, dividing cells.
Similarly, ribosomal protein genes were among the most
highly represented transcripts at 0-4 dpp (8/21 genes)
but minimally present in the 8 or 12-16 dpp ages (2 and
0 times, respectively). The 12-16 dpp group was marked
by numerous abiotic stress related genes.

Table 2 Cucumber fruit contigs very highly expressed in only one age group (>0.1% representation) at 0-4, 8, or 12-16
dpp

Contig #  Length (bp)  Reads (total)  Reads (group)  Hit ID (Arabidopsis)  Hit description                                                              E value

0+4 dpp
3017      2294         1346           1269           AT1G08450             CRT3 (CALRETICULIN 3); calcium ion binding/unfolded protein binding          0.0
2794      1211         1482           1100           AT4G35100             PIP3 (PLASMA MEMBRANE INTRINSIC PROTEIN 3); water channel                    1.0E-139
1157      739          1256           792            AT1G26880             60S ribosomal protein L34 (RPL34A)                                           2.0E-59
2093      1884         1133           721            AT5G62700             TUB3; GTP binding/GTPase/structural molecule                                 0.0
2075      637          1185           657            AT2G21580             40S ribosomal protein S25 (RPS25B)                                           3.0E-51
886       1337         1195           633            AT3G55440             TPI (TRIOSEPHOSPHATE ISOMERASE); triose-phosphate isomerase                  1.0E-119
2327      1622         858            615            AT4G27440             PORB (PROTOCHLOROPHYLLIDE OXIDOREDUCTASE B)                                  0.0
3154      1085         1136           611            AT1G67430             60S ribosomal protein L17 (RPL17B)                                           1.0E-86
885       1370         951            605            AT2G30620             histone H1.2                                                                 1.0E-66
2351      608          1093           579            AT3G61110             ARS27A (ARABIDOPSIS RIBOSOMAL PROTEIN S27)                                   7.0E-46
2804      1198         1072           575            AT3G04840             40S ribosomal protein S3A (RPS3aA)                                           1.0E-126
2255      888          1075           566            AT4G40030             histone H3.2                                                                 4.0E-72
192       1383         701            522            AT1G18250             ATLP-1; thaumatin-like protein                                               1.0E-115
2894      744          641            518            AT5G59970             histone H4                                                                   2.0E-53
2685      1054         788            497            AT5G39850             40S ribosomal protein S9 (RPS9C)                                             1.0E-102
606       1477         756            492            No hits found
1038      1121         948            484            AT3G53740             60S ribosomal protein L36 (RPL36B)                                           8.0E-45
820       702          783            481            AT5G50460             protein transport protein SEC61 gamma subunit, putative                      9.0E-32
107       714          670            474            AT5G59970             histone H4                                                                   1.0E-53
1260      801          867            463            AT5G56710             60S ribosomal protein L31 (RPL31C)                                           3.0E-54
5501      1223         912            459            AT3G52590             UBQ1 (UBIQUITIN EXTENSION PROTEIN 1)                                         1.0E-68

8 dpp
3064      960          1162           776            AT5G22430             unknown protein                                                              1.0E-25
3334      1146         832            505            AT3G46040             RPS15AD (ribosomal protein S15A D)                                           2.0E-27
1282      680          980            490            No hits found
1650      1644         514            404            AT5G33370             GDSL-motif lipase/hydrolase family protein                                   1.0E-146
3111      1010         886            349            No hits found         phloem filament protein; PP1; phloem protein 1 [Cucurbita maxima]            1.0E-25
424       912          1035           331            AT5G59880             ADF3 (ACTIN DEPOLYMERIZING FACTOR 3); actin binding                          5.0E-63
153       715          766            317            No hits found
12937     953          418            312            AT4G15630             integral membrane family protein                                             3.0E-37
11169     949          792            284            AT1G28330             DYL1 (DORMANCY-ASSOCIATED PROTEIN-LIKE 1)                                    3.0E-41
2993      1046         494            270            No hits found
2805      767          757            267            No hits found
13496     951          687            267            No hits found
2700      654          428            263            AT5G38650             proteasome maturation factor UMP1 family protein                             2.0E-61
2357      598          526            261            No hits found
1468      899          821            244            AT1G51200             zinc finger (AN1-like) family protein                                        8.0E-53
2350      973          551            242            AT1G11530             ATCXXS1; protein disulfide isomerase                                         2.0E-37
3297      1632         729            242            AT3G26960             unknown protein                                                              8.0E-35
3067      1145         744            240            AT2G10940             protease inhibitor/seed storage/lipid transfer protein (LTP) family protein  5.0E-61
342       928          717            225            AT3G26960             unknown protein                                                              2.0E-43
456       830          645            223            No hits found
485       1612         919            220            AT1G09690             60S ribosomal protein L21 (RPL21C)                                           1.0E-81
924       1025         673            214            AT4G37300             MEE59 (maternal effect embryo arrest 59)                                     5.0E-24
1445      1372         514            203            AT5G65020             ANNAT2 (Annexin Arabidopsis 2); calcium ion binding                          1.0E-131
620       1165         472            199            AT3G10210             unknown function                                                             1.0E-94
2771      1339         491            195            No hits found
489       829          799            192            AT1G79040             PSBR (photosystem II subunit R)                                              4.0E-56
3147      950          593            192            AT4G14305             unknown function                                                             1.0E-62

12+16 dpp
1561      899          1684           1221           No hits found
504       1593         1167           855            AT5G47120             ATBI1 (BAX INHIBITOR 1)                                                      1.0E-109
2496      1641         1319           810            AT2G02760             ATUBC2 (UBIQUITING-CONJUGATING ENZYME 2)                                     9.0E-86
2811      1259         1273           694            AT1G07890             APX1 (ascorbate peroxidase 1); L-ascorbate peroxidase                        1.0E-119
493       3235         1307           679            No hits found         hypothetical protein [Vitis vinifera]                                        8.0E-37
1702      874          685            607            AT5G59720             HSP18.2 (heat shock protein 18.2)                                            2.0E-69
2864      1409         870            578            AT2G43750             OASB (O-ACETYLSERINE (THIOL) LYASE B); cysteine synthase                     1.0E-142
1551      2577         765            558            AT5G19150             carbohydrate kinase family                                                   1.0E-135
54        1237         1084           538            AT2G23090             unknown protein                                                              4.0E-35
410       1047         1055           518            AT1G13950             ELF5A-1 (EUKARYOTIC ELONGATION FACTOR 5A-1)                                  6.0E-79
296       1842         750            516            AT3G44110             ATJ3; heat shock protein DNAJ homolog, protein binding                       0.0
281       1330         1094           532            AT5G20720             CPN20 (CHAPERONIN 20); calmodulin binding                                    6.0E-98
2649      1071         1017           502            AT2G21660             CCR2 (COLD, CIRCADIAN RHYTHM, AND RNA BINDING 2)                             7.0E-44
2541      2197         740            483            AT3G48000             ALDH2B4; 3-chloroallyl aldehyde dehydrogenase                                0.0
698       1330         950            477            AT5G22080             DNAJ heat shock N-terminal domain-containing protein                         1.0E-112
1568      1292         515            470            AT5G12020             HSP17.6II (17.6 KDA CLASS II HEAT SHOCK PROTEIN)                             7.0E-49
1606      1153         687            455            AT2G43750             OASB (O-ACETYLSERINE (THIOL) LYASE B); cysteine synthase                     1.0E-121
2828      1099         640            435            AT1G51200             zinc finger (AN1-like) family protein                                        4.0E-51
2414      869          613            432            AT3G16640             TCTP (TRANSLATIONALLY CONTROLLED TUMOR PROTEIN)                              2.0E-70
2054      595          551            427            No hits found
2219      882          546            411            No hits found         26 kDa phloem lectin [Cucumis sativus]                                       4.0E-83

Strikingly, genes with unknown function or without
Arabidopsis homologs dominated the group at 8 dpp,
accounting for more than half of the contigs (14/27
genes, 52%). The exponential growth stage of tomato also was associated
with a larger proportion of ESTs with unknown
function relative to other ages [10]. Fewer genes with unknown
function or without Arabidopsis homologs occurred
in the 12-16 dpp group (5/21) and only 1
member of the 0-4 dpp group had no assigned putative
function or was without a homolog in Arabidopsis.
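
The selection of very highly represented contigs, and of those very highly represented in only one age group, can be sketched as follows. The contig names, read counts and library totals are hypothetical, and the interpretation of "0.1% of transcript pool" as a contig's share of all reads in an age group is an assumption.

```python
def highly_represented(reads_by_contig, total_reads, threshold=0.001):
    """Contigs whose reads make up at least `threshold` of the group's read pool."""
    return {c for c, n in reads_by_contig.items() if n / total_reads >= threshold}

def unique_to_one_group(flagged_by_group):
    """For each age group, keep only contigs flagged in that group and in no other."""
    unique = {}
    for group, flagged in flagged_by_group.items():
        others = set().union(*(s for g, s in flagged_by_group.items() if g != group))
        unique[group] = flagged - others
    return unique

# Hypothetical read counts per age group (not data from the study).
reads = {
    "0-4":   {"histone_H4": 518, "tubulin": 400, "phloem_lectin": 30,  "HSP18.2": 20},
    "8":     {"histone_H4": 60,  "tubulin": 220, "phloem_lectin": 349, "HSP18.2": 40},
    "12-16": {"histone_H4": 10,  "tubulin": 300, "phloem_lectin": 150, "HSP18.2": 607},
}
pool_size = {"0-4": 200_000, "8": 100_000, "12-16": 200_000}  # assumed library totals

flagged = {g: highly_represented(reads[g], pool_size[g]) for g in reads}
unique = unique_to_one_group(flagged)
print(unique)
```

Here the tubulin-like contig passes the threshold in every group and is therefore excluded, in the same way that housekeeping genes common to all three groups do not appear in Table 2.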



To identify less highly represented genes that were 
strongly enriched at a specific age group, contigs were 
normalized for portion of reads observed at different 
time points. If transcription levels were constant during 
development, 20% of the transcript reads would be 
observed at each of the five sample ages (i.e., 40% for 0+4 dpp,
20% for 8 dpp and 40% for 12+16 dpp). Overall
distribution of portion of transcripts observed at a given
age followed this expectation for the transcriptome set,
with a mean value of 41.05%, 19.77%, and 39.18%, respectively,
for 0+4, 8, and 12+16 dpp age groups
(Figure 3A). The tails of the distribution (top 2.5%) were 
examined for genes for which transcript levels were 
strongly enriched in a specific age group (approxi- 
mately 120 genes/group). This resulted in three non- 
overlapping sets of genes (Additional file 4: Table S2). 
There also was minimal overlap with the genes listed in
Table 2. The genes listed in Table 2 had an average of
862 ESTs/contig whereas the mean number of ESTs/contig
for the genes identified in this manner was 166. As
was seen for the most highly represented group of genes,
there was uneven distribution of genes without homologs
in Arabidopsis or with unknown function; those
accounted for 18.3% for 0+4 dpp enriched transcripts,
but for 34.1% and 33.8% for 8 dpp and 12+16 dpp
enriched transcripts, respectively.

Figure 3 Portion of gene expression observed in each age group and biological enrichment analysis. (A) Distribution of the portion of
gene expression observed at each age group for all contigs with >30 reads. Shading represents those contigs most strongly expressed for each of
the age groups (top 2.5%). (B) Biological enrichment analysis of contigs with age group-enriched expression as identified in (A). Functional
distribution, normalized frequency, and bootstrap standard deviation (SD) of contigs with putative Arabidopsis homologs was determined using
the categories classification from the Classification SuperViewer from Bio-Array Resource for Arabidopsis Functional Genomics for Gene Ontology
[25]. Shading indicates those categories that are significantly enriched (P<0.05).

Functional enrichment of top 2.5% group of contigs at different ages

Gene Ontology category                  0+4     8       12+16
cell organization and biogenesis        ...     ...     0.38
cell                                    4.45    1.62    ...
DNA or RNA metabolism                   ...     ...     0
plastid                                 2.39    ...     0.45
cell wall                               ...     ...     1.42
lipid metabolism                        2.94    ...     0.61
extracellular                           2.69    ...     3.36
minor CHO metabolism                    3.94    2.7     8.19
transcription factor                    0.98    1.34    2.48
development                             0.92    1.27    3.21
stress                                  1.2     0.54    5.6
response to abiotic or biotic stimulus  2.45    2.55    5.69
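
The normalization described above can be illustrated with a short sketch: each contig's reads are converted to the portion observed in each age group (40%, 20% and 40% expected under constant expression), and the upper tail of each distribution is then selected. The contig names and read values are hypothetical, and the inputs are assumed to have already been normalized for library size.

```python
# Indices into the sample ages [0, 4, 8, 12, 16] dpp.
AGE_GROUPS = {"0+4": (0, 1), "8": (2,), "12+16": (3, 4)}

def group_portions(levels):
    """Portion of a contig's total (library-normalized) reads seen in each age group."""
    total = sum(levels)
    return {g: sum(levels[i] for i in idx) / total for g, idx in AGE_GROUPS.items()}

def most_enriched(portions_by_contig, group, tail=0.025):
    """Contigs in the top `tail` fraction of the distribution for one age group."""
    ranked = sorted(portions_by_contig,
                    key=lambda c: portions_by_contig[c][group], reverse=True)
    keep = max(1, round(len(ranked) * tail))
    return ranked[:keep]

# Hypothetical contigs (not data from the study).
contigs = {
    "constant":      [20, 20, 20, 20, 20],
    "cyclin_like":   [60, 35, 3, 1, 1],
    "expansin_like": [5, 10, 50, 20, 15],
    "hsp_like":      [2, 3, 5, 40, 50],
}
portions = {name: group_portions(levels) for name, levels in contigs.items()}

print(portions["constant"])  # matches the 40/20/40 expectation
# With only four toy contigs, a 25% tail keeps one contig per group.
for group in AGE_GROUPS:
    print(group, most_enriched(portions, group, tail=0.25))
```

Applied to the full set of contigs with at least 30 reads and the default 2.5% tail, the same ranking would give the three age-group-enriched gene sets discussed in the text.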

Fruit set/pre-exponential growth 

Functional enrichment analysis of those transcripts with 
age-group enriched transcript levels indicated that the 0-4 
dpp age group had significantly increased representation 
of genes associated with cell organization and biogenesis, 
and DNA or RNA metabolism, that subsided with age 
(Figure 3B). In addition to histone genes, which were also 
among the most highly abundant transcripts for this age 
group (Table 2), numerous putative cell cycle genes,
cyclin- and cyclin dependent kinase-related gene family
members, exhibited greater than 90% of transcript reads at 
0-4 dpp (Additional file 4: Table S2, Figure 4A). Extensive 
protein interaction and gene expression data from Arabi- 
dopsis have allowed for the development of a picture of 
the cyclin interactome, including characterization of com- 
plexes associated with different cell phases [26]. Cyclin 
related genes strongly enriched at 0-4 dpp in the cucum- 
ber fruit transcriptome, such as putative homologs of 
CDKB1;2, CDKB2;2, CYCB1;2, CYCD3;1, CYCD3;3,
CYCD5;1, were among those associated with the mitosis
and post-mitosis (M and G1) phases in the Arabidopsis
interactome. Elevated expression of several of these genes 
was also observed during fruit set in pollinated vs. unpolli- 
nated apple and cucumber flowers [3,27]. In contrast, the 
homolog of CDKA;1 [TAIR:At3g48750], which was uni- 
formly represented in the young cucumber fruit transcrip- 
tome, was associated with cyclin complexes throughout 
the Arabidopsis cell cycle. 
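
The functional enrichment values referred to at the start of this section (Figure 3B) are reported as normalized frequencies with a bootstrap standard deviation. The sketch below assumes the usual definition, the fraction of genes in a category within the gene set divided by the corresponding fraction in the reference genome, and a simple resampling of the gene set with replacement. The category assignments and reference fractions are invented, and the actual Classification SuperViewer implementation may differ in detail.

```python
import random
import statistics

def normalized_frequency(gene_categories, reference_fraction, category):
    """(fraction of the gene set in `category`) / (fraction of the reference in it)."""
    in_set = sum(1 for cats in gene_categories if category in cats) / len(gene_categories)
    return in_set / reference_fraction[category]

def bootstrap_sd(gene_categories, reference_fraction, category, n_boot=100, seed=0):
    """Standard deviation of the normalized frequency over bootstrap resamples."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        resample = rng.choices(gene_categories, k=len(gene_categories))
        values.append(normalized_frequency(resample, reference_fraction, category))
    return statistics.stdev(values)

# Hypothetical category assignments for ten age-group-enriched genes.
enriched_set = [
    {"stress"}, {"stress", "development"}, {"transcription factor"}, {"stress"},
    {"cell wall"}, {"stress"}, {"development"}, {"plastid"},
    {"transcription factor"}, {"cell wall"},
]
# Assumed fraction of reference-genome genes in each category.
reference_fraction = {"stress": 0.10, "development": 0.08,
                      "transcription factor": 0.06, "cell wall": 0.03, "plastid": 0.12}

nf = normalized_frequency(enriched_set, reference_fraction, "stress")
sd = bootstrap_sd(enriched_set, reference_fraction, "stress")
print(f"stress: normalized frequency = {nf:.2f}, bootstrap SD = {sd:.2f}")
```

A normalized frequency above 1 indicates over-representation relative to the reference; whether it is significant depends on the spread estimated from the resamples.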

Figure 4 Functional groups of genes showing age-specific expression. (A) Expression of cyclin related genes relative to fruit age plotted as
percent total expression for that transcript observed at each age [putative homologs of cyclins B1;2 (At5g06150), B1;4 (At2g26760), D1;1
(At1g70210), D3;1 (At4g34160), D3;3 (At3g50070), D5;1 (At4g37630), and cyclin dependent kinases (CDK), CDKB1;2 (At2g38620), CDKB2;2
(At1g20930), CDKD1;3 (At1g18040), CDKE;1 (At5g63610); CKS1 (At2g27960)]. (B) GDSL-motif lipase/hydrolase family protein genes (putative
homologs of At1g09390, At1g56670, At2g04570, At2g42990, At3g16370, At3g48460, At5g03610, At5g14450, At5g33370, At5g62930) and
transcription factor SHINE1 (At1g15360; dotted line). (C) Lipid transfer protein (LTP) family protein genes (putative homologs of At1g48750,
At1g62510, At2g10940, At2g45180, At5g01870, At5g64080). (D) Phloem proteins. Solid lines indicate cucurbit specific phloem proteins as listed in
Table 1. Dotted lines indicate putative homologs to Arabidopsis phloem proteins (ATPP) 2-A genes (two for ATPP2-A1, one for ATPP2-A9, and
ATPP2-A13); dashed lines are putative homologs of ATPP2-B genes (ATPP-B10 and ATPP-B12). (E) Transcription factors showing preferential
expression at 12+16 dpp (putative homologs of At1g27730, At1g50640, At2g17040, At2g26150, At2g40140, At2g46240, At3g15210, At3g15510,
At3g16770, At3g56400, At4g11660, At4g16780, At4g39250, At5g25560, At5g51190).



The categories of plastid and chloroplast also were sig- 
nificantly enriched in the 0-4 dpp group, then declined 
with age. This is consistent with the decrease in chloro- 
phyll observed after 4dpp; chlorophyll content per gram 
fresh weight peaked at 4 dpp, and then decreased until 12 
dpp (Figure 5A). The assembled contigs included 91 tran- 
scripts whose homologs in Arabidopsis had annotations 
including one or more of the following terms: chlorophyll,
chloroplast, photosystem, or thylakoid (Additional file 5:
Table S3). Overall patterns of transcript abundance for 
these genes paralleled chlorophyll content in the develop- 
ing fruit (Figure 5B). 

K-means cluster analysis allowed for further identification
of transcripts showing progressive patterns of representation
with fruit age (Figure 6). The chloroplast and
other photosynthesis related genes described above,
along with homologs of at least 10 additional chloroplast
located proteins and enzymes, predominated in the 4 or
4+8 clusters, but were minimally observed in the 0, 0+4,
or 8 dpp clusters, and did not appear in the later clusters
(Figure 6, Additional file 6: Table S4).

Figure 5 Chlorophyll content and expression of chlorophyll and
chloroplast-related transcripts in relationship to fruit age. (A)
Chlorophyll content. (B) Expression of chlorophyll and chloroplast-related
transcripts in relationship to fruit age. Gene expression is
plotted as percent of total expression observed for that transcript at
each age. The heavy white line represents average percent gene
expression at each age for the 91 genes (Supplemental file 2) with
homologs in Arabidopsis annotated to be associated with
chlorophyll or chloroplasts.

Exponential growth 

The group of genes with peak abundance at the 8 dpp, ex- 
ponential growth stage, included cytoskeleton, cell wall, 
and water and carbohydrate transport genes. Tubulins,
actin-related proteins, extensins, expansins, cellulose
synthases, pectinase modifying enzymes, aquaporins, vacuolar
H+-ATPases, and phloem filament and lectin proteins,
were among those strongly represented, as has been 
observed for other rapidly growing fleshy fruits such as to- 
mato, apple, grape, and watermelon [10,13-15,17,18]. The 
major latex protein related genes also exhibited peak levels 
at 8 dpp, including two extremely highly transcribed genes 
that together accounted for more than 17,000 reads (Add- 
itional file 6: Table S4). 

Putative homologs of vacuolar ATP synthase subunits B, 
D, H and P2 [TAIR:At4g38510, At3g58630, At3g42050, 
At1g19910] showed coordinate transcript abundance, with
comparable levels increasing steadily until 8 dpp, and then
gradually declining. Two very highly represented homologs
of the vacuolar aquaporin gene [TAIR:At2g36830], gamma
tip tonoplast intrinsic protein, also peaked at 4-8 dpp
(Additional file 6: Table S4).

All of the cucurbit specific phloem proteins listed in 
Table 1 and the four putative homologs of the Arabidop- 
sis phloem protein (ATPP) A2 family members observed 
in the data set peaked somewhat later, at 8-16 dpp with 
minimal transcript levels at 0 and 4 dpp (Figure 4D). 
Cucurbits are characterized by a unique and functionally 
divergent network of extrafascicular phloem external to 
the vascular bundles [28-30]. The highly expressed proteinaceous
phloem filaments, comprised of the cucurbit-specific
PP1 proteins, and the more widely distributed
PP2 phloem lectin proteins [31], were found to be primarily
associated with the extrafascicular phloem [30].
Strong expression of phloem protein genes during rapid
growth has been observed in other studies, including
PP1 expression in green stage watermelon fruit
[18,31,32]. Specific expression of PP2 (a group A member
[31]) was observed in young pumpkin (Cucurbita
pepo) hypocotyls, peaking at 12 days after germination in
concert with the period of peak growth and vascular dif- 
ferentiation [32]. In contrast, cucumber homologs of the 
ATPP2-B family had a nearly inverted pattern of tran- 
script levels relative to PP2-A genes, peaking at 0 dpp, 
and dropping during exponential growth, suggesting pos- 
sible functional divergence (Figure 4D). 

The period of rapid fruit enlargement was also asso- 
ciated with marked changes in fruit surface, including an 
increase in cuticle thickness as is typically observed dur- 
ing rapid plant growth [33], and loss of the silica oxide 
powder based 'bloom'. The homolog of the Cucurbita 
moschata silicon transporter [GenBank:327187680; ref.
23] showed age specific transcript abundance peaking at 
8 dpp then dropping sharply, coinciding with the time of 
bloom loss from the middle of the fruit (the region from 
which samples were taken). 

Figure 6 Profiles of cucumber fruit transcripts showing age-specific
expression patterns as determined by K-means
analysis. Analyses were performed using Cluster 3.0 software [25].

Among the genes identified in other systems to be associated
with cuticle biosynthesis are the extracellular GDSL
motif lipase/hydrolase proteins and lipid transfer proteins,
which have been implicated in lipid transport to extracellular
surfaces [33-36]. The cucumber fruit transcriptome
set included eleven GDSL motif lipase/hydrolase protein 
family members that were represented by at least 30 ESTs, 
including five with more than 100 ESTs. The majority 
showed peak levels at 8 or 12-16 dpp, with virtually no 
measured reads until either 8 or 12 dpp (Figure 4B). 
Twelve lipid transfer protein (LTP) family members with 
greater than 30 ESTs/contig also were observed in the 
transcriptome data set, including four with greater than 
700 ESTs. As for the GDSL motif lipase/hydrolase protein
genes, the majority of the lipid transfer proteins were most 
highly represented from 8-16 dpp; transcript levels of one 
gene peaked at 4-8 dpp (Figure 4C). 

A homolog of the transcription factor gene SHINE1
[TAIR:At1g15360], which is associated with cuticle production
in Arabidopsis [37], also exhibited peak
transcript abundance at 8 dpp (Figure 4B). Additionally, transcript levels
of two cytochrome P450 family members (CYP86A and
CYP77A) that have been associated with cutin biosynthesis
[38], and two putative beta amyrin synthases, enzymes
which have been associated with cuticular wax synthesis in
tomato [39], also peaked at 8 dpp (Additional file 4: Table
S2). In contrast, two putative GDSL family members and 
one lipid transfer protein with moderate transcript levels 
(45-55 ESTs) [homologs of TAIR:At5g62930, At5g03610, 
and At2g45180, respectively] were observed almost exclu- 
sively at 0 dpp, suggesting possible floral, rather than fruit, 
expression (Additional file 6: Table S4). 

Late/post exponential growth 

Stress-related genes (response to stress and response to 
abiotic and biotic stimulus categories) were over- 
represented at all stages, but considerably more so at 12- 
16 dpp than at the younger ages of 0-4 and 8 dpp 
(Figure 3B). The 12+16 dpp age group had the highest rep- 
resentation of abiotic and biotic stress related genes, in- 
cluding a variety of heat shock, redox, biotic defense and 
ethylene-related transcripts (Additional file 4: Table S2).
Of the 120 genes in this group, 44 have high homology
with genes associated with plant stress, including at least
13 stress-related transcription factors such as WRKY70, an
activator of SA-dependent defense; radical induced cell
death; and ethylene response, salt stress, and heat shock
transcription factors (Figure 4E; Additional file 4: Table S2).

Overall, the group of genes with peak abundance at
12+16 dpp was significantly enriched for transcription factor
genes (2.48-fold enrichment, normalized frequency
relative to Arabidopsis; P value = 3.19E-04) (Figure 3B),
accounting for 16% of the top 2.5% set. This may be contrasted
with the total cucumber fruit transcriptome data
set where transcription and transcription factor activity 
related genes were represented at a normalized frequency 
of 0.94 relative to occurrence in the Arabidopsis genome. 
Transcription factors in the top 2.5% of 0+4 and 8 dpp 
groups also were represented at a comparable frequency to 
the Arabidopsis genome, accounting for 3.7% and 4.6% of 
the gene list, respectively. 
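
As a rough illustration of how a normalized frequency of the kind reported above can be computed, the sketch below assumes the definition commonly given for the Classification SuperViewer: the fraction of classified genes in the input set that fall in a category, divided by the corresponding fraction in the Arabidopsis reference set. The function name and all counts are hypothetical and are not data from this study.

```python
def normalized_frequency(in_class_input, classified_input,
                         in_class_reference, classified_reference):
    """Fold representation of a category in an input gene set relative to
    a reference set (assumed definition; see lead-in)."""
    input_fraction = in_class_input / classified_input
    reference_fraction = in_class_reference / classified_reference
    return input_fraction / reference_fraction


# Hypothetical counts for illustration only.
# 20 of 100 input genes vs. 1,000 of 10,000 reference genes in the category.
fold = normalized_frequency(20, 100, 1000, 10000)
print(round(fold, 2))
```

A value above 1 indicates over-representation of the category relative to the reference set, and a value near 1 (such as the 0.94 reported for the whole data set) indicates representation comparable to the reference.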

In addition to the stress related transcription factors 
with specific representation at 12-16 dpp, several puta- 
tive transcription factor homologs were annotated to be 
associated with development [e.g., embryo sac development
(BEL1-LIKE HOMEODOMAIN 1), morphogenesis
(anac036/NAC domain containing protein 36), and cell
expansion (ATHB-2 homeobox protein)] (Additional file
4: Table S4). Furthermore, transcripts of other genes with
homologs that have been implicated in development
related processes were specifically observed at 12-16 dpp,
such as putative homologs of TCTP (TRANSLATIONALLY
CONTROLLED TUMOR PROTEIN); BTB AND
TAZ DOMAIN PROTEIN 1; calcium-binding EF hand
family protein; seed development related (E12A11); and
BAX INHIBITOR 1.

Conclusions 

Examination of early cucumber fruit growth from the 
period of pollination and initial fruit set through the end 
of the exponential growth phase shows a dynamic series 
of physiological and morphological changes (Figure 7). 
Transcriptomic analysis of the predominant genes repre- 
sented in the different age groups as identified either by 
total number of reads (most highly represented among 
the genes at that age), portion of transcript reads 
observed at that age, or genes grouped by K-means clus- 
ter analysis, told a story aligned with the sequential 
stages of development. 

Transcript representation in the youngest ages, 0-4 dpp, 
was uniquely characterized by genes associated with cell 
division, cell organization and biogenesis. At 4 dpp, tran- 
scription of the cell cycle genes was declining, while 
chloroplast, photosynthesis, and chloroplast-localized 
genes were peaking. Transcripts highly abundant during 
the exponential growth phase, 4-12 dpp, included exten- 
sive representation of genes associated with cell structure 
such as cytoskeleton, vacuoles and cell walls, along with 
surface lipid metabolism related genes, in concert with the 
period of greatest increase in cuticle thickness. 

A second shift in the transcriptome profile was 
observed at 12-16 dpp with significant enrichment of 
abiotic and biotic stress related genes and stress-related
and developmental transcription factor gene homologs.
The enriched representation of numerous transcription
factors relative to earlier ages suggests a programmatic
change away from fruit growth, toward defense, and ultimately
fruit maturation. This is also the time period
where we have observed transition of cucumber fruit 
from susceptibility to resistance to P. capsici [8,9]. Clas- 
sically, fleshy fruit development is described to consist 
of three stages post pollination: cell division, cell expan- 
sion, and ripening [1]. These results suggest that the 
interval between expansive growth and ripening may in- 
clude further developmental differentiation; an emphasis 
on defense would be consistent with the role of fruit in 
protecting the developing seeds during embryo matur- 
ation prior to facilitating seed dispersal. 

Finally, approximately 5% of the contigs represented by 
>30 reads either did not have identified putative homologs, 
or did not have homologs outside of cucurbits, suggesting
potentially unique genes specific to cucumber or cucur- 
bits. The observation that these genes, as well as genes 
with homologs but with no annotated function, rarely oc- 
curred in the 0-4 dpp group, suggests commonality 
among processes associated with early fruit set and cell 
division and/or greater knowledge about the fruit set stage. 
The predominance of transcripts without non-cucurbit 
homologs or with unknown predicted functions during 
the peak exponential growth stage may reflect fewer stud- 
ies to date about this phase of growth, or unique adapta- 
tions of cucurbits to allow for extreme fruit growth rates 
associated with these species. 

Collectively, the transcriptomic information provided
by the young cucumber fruit samples coupled with morphological
analyses provides an informative picture of
early fruit development characterized by phases of active 
cell division, fruit expansion including novel or unchar- 
acterized genes, and response to the environment, as 
summarized in Figure 7. The progressive modules of 
transcript abundance tell a story of cell division, develop- 
ment of photosynthetic capacity, cell expansion and fruit 
growth, phloem activity, protection of the fruit surface, 
and finally transition away from fruit growth toward 
defense and maturation. 

Methods 

Plant material, fruit growth, chlorophyll and cuticle 
measurements 

Sets of 80 cucumber plants per experiment (pickling 
type, cv. Vlaspik; Seminis Vegetable Seed Inc, Oxnard, 
CA) were grown in the greenhouse in 3.78 L plastic pots 
filled with BACCTO (Michigan Peat Co., Houston, TX)
media and fertilized once per week. Temperature was
kept between 21 and 25°C, and supplemental lights were used
to provide an 18 h light period. Pest control was performed
according to standard management practices. All
flowers for each experiment were hand pollinated on a 
single date (1-2 flowers per plant). The experiment was 
repeated three times. Prior to the harvests, which were
performed at 4 day intervals from 0-16 dpp, fruit were
measured for length and diameter, and examined for external
appearances including: presence or absence of wax
along the length of the fruit; wart development; color
patterns (e.g., stripes); and changes in presence, color,
and densities of spines. Pericarp and placenta size was
measured from the cross section of the fruit after
harvest.

Figure 7 Schematic representation of morphological and gene expression changes occurring during early cucumber fruit development.
Gene expression data refer to periods of peak expression for indicated gene categories. Data for respiration, cell division, and susceptibility to P.
capsici are from Marcelis and Hofman-Eijer [4], Colle et al. [7], and Gevens et al. [9], respectively.



Exocarp samples (upper 1 to 2 mm) for chlorophyll
measurement were removed by fruit peeler from the
center portion of five fruit at each age and stored at
-20°C. Samples were subsequently thawed at room
temperature and blotted on paper to remove excess
water, and 1 g portions were immersed in
N,N-dimethylformamide for at least 24 hours at 4°C in
the dark. Total chlorophyll was calculated based on
spectrophotometer absorbance measurements at 665 and
647 nm [40]. Samples to measure cuticle thickness were
stained with Sudan IV (as per [41]) and measured using a
Spot RT3 Digital Camera System at 200x magnification
(SPOT Imaging Solutions, Diagnostic Instruments, Inc.,
MI).

cDNA library production and 454 sequencing 

Randomly assigned groups of twenty fruit were harvested 
at 0, 4, 8, 12, and 16 dpp and ranked by size; the middle 
ten fruits were used for RNA extraction. Pericarp sam- 
ples consisting of exocarp, mesocarp, and placenta tissue 
but not seeds, were isolated from the center portion of 
the fruit by razor blade, immediately frozen in liquid nitrogen,
and stored at -80°C until RNA was isolated.
Samples from ten fruits were pooled for RNA extraction;
RNA and oligo(dT)-primed cDNA sample preparation
were based on the procedures of Schilmiller et al. [42]
and Ando and Grumet [6]. Final concentration was
assessed by the NanoDrop ND-1000 method and subsequent
steps for 454 Titanium (0, 4, 12, 16 dpp) pyrosequencing
analysis were performed by the Michigan State
University Research Technology Support Facility (RTSF). 
Each sample was loaded on a 1/4 plate 454 PicoTiterPlate
(454 Life Sciences, a Roche Corporation, CT). The
8 dpp sample was sequenced previously [6]. 

Contig assembly and gene annotation 

Contigs were assembled by the MSU RTSF Bioinformatics 
Group. Reads were processed through The Institute for 
Genomic Research (TIGR) SeqClean pipeline to trim re- 
sidual sequences from the cDNA preparation, poly(A) tails 
and other low quality or low complexity regions [43]. 
Trimmed sequences were assembled into contigs using 
the TIGR Gene Indices Clustering Tools (TGICL) [44].
Stringent clustering and alignment parameters were used 
to limit the size of clusters for assembly. Contigs from the 
first pass of assembly were then combined and subjected 
to a second assembly pass with CAP3 [45]. Less stringent
alignment parameters were used for this pass to allow for 
minor sequencing errors or allelic differences in the cDNA 
sequence. Read data for 8 day post pollination samples are
available from the Sequence Read Archive (SRA), access- 
ible through NCBI BioProject ID PRJNA79541. Read data 
for 0, 4, 12 and 16 dpp samples in SRA as well as 
assembled contig sequences deposited as Transcriptome 
Shotgun Assemblies (TSA) and expression profiling data 
in the Gene Expression Omnibus (GEO) are available 
through NCBI BioProject ID PRJNA169904. 

To estimate relative expression, the number of reads originating
from each cDNA library was counted for each
contig and reported relative to the total number of reads
generated for that library as transcripts per thousand
(TPT). The final contigs were subjected to BLASTX
search against the green plant subdivision of the NCBI nr
protein database and/or the Arabidopsis protein (TAIR9) 
databases to search for similarity to previously identified 
genes and assign possible gene functions. BLASTN ana- 
lysis was performed for highly expressed contigs for which 
homologs were not identified by BLASTX searches. 
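
The transcripts per thousand (TPT) normalization described above can be sketched as follows. This is a minimal illustration, assuming TPT is simply 1000 times a contig's read count in a library divided by that library's total read count. The function name, data layout, and counts are hypothetical and are not taken from the study's pipeline or data.

```python
def transcripts_per_thousand(contig_reads, library_totals):
    """Return {contig: {library: TPT}}.

    contig_reads:   {contig: {library: reads assigned to the contig}}
    library_totals: {library: total reads generated for the library}
    """
    tpt = {}
    for contig, counts in contig_reads.items():
        tpt[contig] = {
            library: 1000.0 * counts.get(library, 0) / total
            for library, total in library_totals.items()
        }
    return tpt


# Hypothetical counts for illustration only.
library_totals = {"0dpp": 200_000, "8dpp": 250_000}
contig_reads = {"contig_A": {"0dpp": 50, "8dpp": 500}}

tpt = transcripts_per_thousand(contig_reads, library_totals)
print(tpt["contig_A"])
```

Scaling by library size in this way lets read counts be compared across libraries that were sequenced to different depths.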

Transcriptome analysis 

The Classification SuperViewer Tool w/Bootstrap web 
database [25] was used for GO categorization, determin- 
ation of normalized frequencies relative to Arabidopsis, 
and calculation of bootstrap standard deviations and P-values.
The PRINCOMP procedure in SAS 9.1 (SAS Institute, Cary,
NC) was used for principal component analysis. The first
two principal components, which explain nearly 90% of
the total variation, were extracted from the covariance
matrix. To examine relative gene expression at each age,
the portion of reads for that transcript relative to total
reads for the transcript was calculated for each transcript
with >30 reads, for each age. Expression profiles were clustered
by the K-means method using Cluster 3.0 software [46].
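
The per-age expression profile used as input for clustering can be sketched as below. This is an illustrative reading of the description above, assuming the proportion is computed from read counts summed over the five ages and that transcripts with 30 or fewer total reads are excluded. The function name, the `min_reads` parameter, and the counts are hypothetical.

```python
AGES = (0, 4, 8, 12, 16)  # days post pollination sampled in the study


def age_proportions(reads_by_age, min_reads=30):
    """Percent of a transcript's total reads observed at each age.

    Returns None for transcripts that do not exceed min_reads in total,
    mirroring the >30 read threshold described in the text.
    """
    total = sum(reads_by_age.get(age, 0) for age in AGES)
    if total <= min_reads:
        return None
    return {age: 100.0 * reads_by_age.get(age, 0) / total for age in AGES}


# Hypothetical counts for illustration only.
profile = age_proportions({0: 60, 4: 30, 8: 6, 12: 3, 16: 1})
early_share = profile[0] + profile[4]  # share of reads at 0-4 dpp
print(profile, early_share)
```

With a profile of this form, a statement such as "greater than 90% of transcript reads at 0-4 dpp" corresponds to the summed percentages at 0 and 4 dpp.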

qRT-PCR 

Total RNA was isolated and assessed for quality and 
quantity as above. RT reactions were performed using 
the High Capacity RNA-to-cDNA kit (Applied Biosys- 
tems, Foster City, CA). Gene-specific primers (Add- 
itional file 7: Table S5) were designed using Primer 
Express software. ABI Prism 7900HT Sequence Detec- 
tion System was used for qRT-PCR analysis. Power SYBR 
Green PCR Master Mix (Applied Biosystems) was used 
for PCR quantification. Actin from C. sativus was used 
as an endogenous control and for normalization. Each 
qRT experiment was repeated three times. PCR products 
from each gene were quantified with reference to corre- 
sponding standard curves. 
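
Quantification against a standard curve, followed by normalization to the actin control, can be sketched as follows. This is a generic illustration rather than the authors' exact procedure: it assumes a linear standard curve of Ct against log10 of template quantity, fitted by ordinary least squares, with a separate curve for each gene. All function names and values are hypothetical.

```python
def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, curve):
    """Invert the standard curve to estimate template quantity."""
    slope, intercept = curve
    return 10 ** ((ct - intercept) / slope)


def relative_expression(target_ct, target_curve, control_ct, control_curve):
    """Target quantity normalized to the endogenous control (e.g., actin)."""
    return (quantity_from_ct(target_ct, target_curve)
            / quantity_from_ct(control_ct, control_curve))


# Hypothetical dilution series for illustration only.
log10_qty = [0, 1, 2, 3]
target_curve = fit_standard_curve(log10_qty, [30.0, 26.7, 23.4, 20.1])
actin_curve = fit_standard_curve(log10_qty, [28.0, 24.7, 21.4, 18.1])

ratio = relative_expression(26.7, target_curve, 21.4, actin_curve)
print(target_curve, ratio)
```

Fitting a separate curve per gene allows for differences in amplification efficiency between the target and the control primers.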
Additional file 1: Table S1. Summary of 454 sequencing results and
contig assembly for cucumber fruit libraries 0-16 days post pollination.

Additional file 2: Figure S1. Relationship between number of ESTs per
contig, mean contig length, and percent of contigs with homologs in

Additional file 3: Figure S2. qRT-PCR verification of gene expression
changes.

Additional file 4: Table S2. Genes differentially expressed at 0-4, 8, or